35 research outputs found

    Argumentation Mining in Parliamentary Discourse

    In parliamentary discourse, politicians expound their beliefs and goals through argumentation, and, to persuade the audience, they communicate their values by highlighting some aspect of an issue, an action commonly known as framing. The choice of frames typically depends on the speaker's ideology. In this proposed doctoral work, we will computationally analyze framing strategies and present a model for discovering the latent structure of framing of real-world issues in Canadian parliamentary discourse.

    Reply to Commentary on “Argumentation Mining in Parliamentary Discourse”

    Automated Extraction of Protein Mutation Impacts from the Biomedical Literature

    Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing information about mutations and their impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually reading through the rich and fast-growing repository of biomedical literature is expensive and time-consuming. A number of manually curated databases, such as BRENDA (http://www.brenda-enzymes.org), try to index and provide this information; yet the data they provide appear to be incomplete. Thus, there is a growing need for automated approaches to extract this information. In this work, we present a system to automatically extract and summarize impact information from protein mutations. The system's extraction module is split into four subtasks: organism analysis, mutation detection, protein property extraction and impact analysis. Organisms, as sources of proteins, must be extracted to help disambiguate genes and proteins, so our system extracts organisms and grounds them to NCBI. We detect mutation series to correctly ground the detected impacts. Our system also extracts the affected protein properties as well as the magnitude of the effects. The output of our system is used to populate an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on both external and internal corpora and databases. The results show the reliability of the approaches. Our organism extraction system achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5% on the OT corpus. On the manually annotated Linnaeus-100 corpus, it achieves a precision of 99%, a recall of 97% and a grounding accuracy of 97.4%. In the impact detection task, our system achieves a precision of 70.4%-71.8% and a recall of 71.2%-71.3% on manually annotated documents.
Our system grounds the detected impacts with an accuracy of 70.1%-71.7% on the manually annotated documents, and achieves a precision of 57%-57.5% and a recall of 82.5%-84.2% against the BRENDA data.

    Automated extraction and semantic analysis of mutation impacts from the biomedical literature

    BACKGROUND: Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing information about mutations and their impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast-growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges remain in grounding impacts to their respective mutations and in recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities. RESULTS: We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguate genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects, and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1% and a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data, and end-user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/open-mutation-miner.
CONCLUSION: We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora, and the results show the reliability of our approach. Representing the extracted information in a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, the analysis results are accessible through multiple interfaces, including web services for automated data integration and desktop-based solutions for end-user interactions.
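
The abstract notes that single-point mutation detection is well covered by existing systems but does not describe how OMM does it. As a rough illustration only, the sketch below matches the common one-letter notation (wild-type residue, position, mutant residue, as in "S94A") with a regular expression. It is an assumption-laden sketch, not OMM's implementation: it ignores three-letter forms such as "Ser94Ala", lowercase variants and the mutation-series grounding the abstract describes. The function name `find_point_mutations` is hypothetical.

```python
import re

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type residue, sequence position, mutant residue, e.g. "S94A".
# Word boundaries keep the pattern from matching inside longer tokens.
POINT_MUTATION = re.compile(
    rf"\b([{AMINO_ACIDS}])([1-9][0-9]*)([{AMINO_ACIDS}])\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples for each mention in text."""
    results = []
    for match in POINT_MUTATION.finditer(text):
        wild_type, position, mutant = match.groups()
        if wild_type != mutant:  # skip silent notations such as "A12A"
            results.append((wild_type, int(position), mutant))
    return results


if __name__ == "__main__":
    sentence = "The S94A and L112V variants showed reduced Km, unlike A12A."
    print(find_point_mutations(sentence))  # [('S', 94, 'A'), ('L', 112, 'V')]
```

A pattern like this only finds candidate mentions; linking each one to the protein, organism and reported impact is the harder grounding problem the abstract focuses on.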

    OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents

    Motivation: Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. Results: We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. Availability: The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger.
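
The abstract says each mention is normalized to a canonical name partly by resolving abbreviations, without giving the rules. A minimal sketch of one common heuristic follows, assuming that an abbreviated binomial such as "E. coli" refers to a full name with the same genus initial and species epithet mentioned earlier in the same document. It is not the OrganismTagger's code and performs no NCBI Taxonomy lookup; because any capitalised word followed by a lowercase word counts as a candidate full name here, a real system would filter candidates against a taxonomy-derived lexicon. The function name `expand_abbreviations` is hypothetical.

```python
import re

# Candidate full binomial name: capitalised genus, then a lowercase species epithet.
FULL_NAME = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")
# Abbreviated binomial name such as "E. coli".
ABBREVIATION = re.compile(r"\b([A-Z])\. ([a-z]+)\b")


def expand_abbreviations(text):
    """Map abbreviated organism mentions to full names seen earlier in text.

    "E. coli" is expanded only if a full name with genus initial "E" and
    species "coli" occurs before it. No taxonomy lookup is performed.
    """
    # Collect both kinds of mention and process them in document order.
    events = [(m.start(), "full", m) for m in FULL_NAME.finditer(text)]
    events += [(m.start(), "abbr", m) for m in ABBREVIATION.finditer(text)]
    events.sort(key=lambda event: event[0])

    seen = {}        # (genus initial, species) -> full name
    expansions = {}  # abbreviated mention -> full name
    for _, kind, match in events:
        genus, species = match.groups()
        if kind == "full":
            seen[(genus[0], species)] = f"{genus} {species}"
        elif (genus, species) in seen:
            expansions[match.group(0)] = seen[(genus, species)]
    return expansions


if __name__ == "__main__":
    doc = ("Escherichia coli and Bacillus subtilis were grown. "
           "E. coli cells divided faster than B. subtilis cells.")
    print(expand_abbreviations(doc))
    # {'E. coli': 'Escherichia coli', 'B. subtilis': 'Bacillus subtilis'}
```

Processing mentions in document order matters: an abbreviation that appears before any matching full name is left unresolved rather than guessed.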

    Named entity recognition in chemical patents using ensemble of contextual language models

    Chemical patent documents describe a broad range of applications and hold key reaction and compound information, such as chemical structures, reaction formulas, and molecular properties. These informational entities must first be identified in text passages before they can be used in downstream tasks. Text mining provides the means to extract relevant information from chemical patents through information extraction techniques. As part of the Information Extraction task of the Cheminformatics Elsevier Melbourne University challenge, in this work we study the effectiveness of contextualized language models for extracting reaction information from chemical patents. We assess transformer architectures trained on generic and specialised corpora and propose a new ensemble model. Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%. The results show that an ensemble of contextualized language models can provide an effective method to extract information from chemical patents.
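
The abstract names a majority ensemble but not its details. The sketch below shows one plausible reading, assuming each model emits one label per token over the same tokenization and that ties go to the earliest model in the list; the tie-breaking rule, the function name `majority_vote` and the label names in the example are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token label sequences from several models by majority vote.

    predictions: list of label sequences, one per model, all the same length.
    Ties are broken in favour of the label predicted by the earliest model.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all label sequences must have the same length")

    combined = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        best = max(counts.values())
        # First label (in model order) that reaches the top count wins.
        combined.append(next(lab for lab in token_labels if counts[lab] == best))
    return combined


if __name__ == "__main__":
    # Hypothetical BIO labels from three models for a three-token passage.
    model_outputs = [
        ["B-SOLVENT", "O", "O"],
        ["B-SOLVENT", "O", "B-REAGENT"],
        ["O", "O", "B-REAGENT"],
    ]
    print(majority_vote(model_outputs))  # ['B-SOLVENT', 'O', 'B-REAGENT']
```

Voting token by token is simple, but it can produce invalid BIO sequences (an `I-` tag with no preceding `B-`), so a span-level vote or a repair pass may be preferable in practice.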

    Text mining processing pipeline for semi structured data D3.3

    Unstructured and semi-structured cohort data contain relevant information about the health condition of a patient, e.g., free text describing disease diagnoses, drugs and medication reasons, which is often not available in structured formats. One of the challenges posed by medical free text is that a concept can be mentioned in several ways. Therefore, encoding free text into unambiguous descriptors allows us to leverage the value of the cohort data, in particular by facilitating its findability and interoperability across cohorts in the project. Named entity recognition and normalization enable the automatic conversion of free text into standard medical concepts. Given the volume of data shared in the CINECA project, the WP3 text mining working group has developed named entity normalization techniques to obtain standard concepts from unstructured and semi-structured fields available in the cohorts. In this deliverable, we present the methodology used to develop the different text mining tools created by the dedicated SFU, UMCG, EBI, and HES-SO/SIB groups for specific CINECA cohorts.
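
To make "several ways of mentioning a concept" concrete, here is a minimal dictionary-based normalization sketch. It assumes a hand-built lexicon mapping surface forms to concept identifiers. The `CONCEPT:000N` identifiers are placeholders rather than real ontology codes, and the function names are hypothetical. The deliverable's actual tools are not described here and likely use richer methods than exact lookup.

```python
import re

# Hypothetical lexicon: several surface forms map to the same placeholder concept.
LEXICON = {
    "myocardial infarction": "CONCEPT:0001",
    "heart attack": "CONCEPT:0001",
    "type 2 diabetes": "CONCEPT:0002",
    "t2dm": "CONCEPT:0002",
}


def normalize_mention(mention):
    """Return the concept ID for a mention, or None if it is not in the lexicon."""
    # Case-fold and collapse whitespace so trivial variants share one key.
    key = re.sub(r"\s+", " ", mention.strip().lower())
    return LEXICON.get(key)


def normalize_field(field):
    """Normalize a semi-structured field holding ';'- or ','-separated mentions."""
    return [normalize_mention(part) for part in re.split(r"[;,]", field)
            if part.strip()]


if __name__ == "__main__":
    print(normalize_field("Heart attack; T2DM, asthma"))
    # ['CONCEPT:0001', 'CONCEPT:0002', None]
```

Returning `None` for unknown mentions keeps gaps visible, which is useful when measuring how much of a cohort field the lexicon actually covers.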

    Computational Analysis of Arguments and Persuasive Strategies in Political Discourse

    Various persuasive strategies are employed in advancing argumentation. This dissertation presents the first computational work on analyzing persuasive strategies in monological and dialogical argumentation in natural language. I begin with reputation defence strategies and show to what extent human annotators agree on these strategies. I present the first corpus of parliamentary debates manually annotated with the most agreed-upon face-saving strategies and show that linguistic features automatically extracted from the text of debates can differentiate between these strategies. Having shown the effectiveness of discourse parsing features in the classification of reputation defence strategies, I hypothesize that directly using the features that are effective for discourse parsing can improve the classification results. My experiments validate this hypothesis and show that the developed methods can automatically label speeches with these strategies. I then explore whether the language of face-saving in speeches can be predicted automatically and show that, by leveraging the contextual information of the speeches, we can reliably distinguish reputation defence from non-defence. I further investigate whether statements in face-threatening and face-saving speeches can be automatically classified by truthfulness using the effective linguistic features introduced in prior literature, and show that while some of these features help identify dodging, they are not very effective in identifying the truthfulness of statements. I also operationalize framing analysis as a classification task and show that neural language models can capture the abstract representations of frames more effectively. My experiments also show that frames are transferable across genres.
Finally, in collaboration with several researchers, we examine to what extent expert and lay annotators can evaluate argumentation aspects, and show that the agreement of both groups is limited.
Ph.D.

    DS4DH participation in the TREC health misinformation track 2021

    This notebook paper describes the participation of the Data Science 4 Digital Health (DS4DH) group in the TREC Health Misinformation Track 2021. We submitted 7 runs to the AdHoc Web Retrieval task. Our approach consists of three steps. First, to estimate usefulness, we generated an initial document list using statistical and neural ranking approaches. Second, we estimated supportiveness and credibility with transfer learning. Third, we merged the usefulness, supportiveness and credibility scores to re-rank the initial list. To produce the usefulness score, we used a combination of BM25 ranking with three transformer-based language models trained on the MS MARCO corpus. To estimate the supportiveness score, we applied pretrained and fine-tuned BERT-based language models. To calculate the credibility score, we used a random forest model in combination with a list of credible sites. Finally, we used rank fusion to merge the scores. Our approach achieves performance similar to the overall median of the participants.
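
The abstract says the usefulness, supportiveness and credibility signals were merged with rank fusion, without naming the method. Reciprocal rank fusion (RRF) is one widely used option and is assumed here purely for illustration; the notebook paper may have used a different or weighted scheme. The sketch fuses several ranked lists of document IDs; `k=60` is a commonly used default, and the document IDs in the example are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. Documents missing from a list get nothing from it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    usefulness = ["d1", "d2", "d3"]
    supportiveness = ["d2", "d1", "d3"]
    credibility = ["d2", "d3", "d1"]
    print(reciprocal_rank_fusion([usefulness, supportiveness, credibility]))
    # ['d2', 'd1', 'd3']
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores, classifier probabilities and credibility scores live on different scales; the trade-off is that it discards how large the score gaps were.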